Diploid Alignment is NP-hard
Authors
Abstract
Human genomes consist of pairs of homologous chromosomes, one inherited from the mother and the other from the father. Species with pairs of homologous chromosomes, such as humans, are called diploid. The sequence analysis literature is, however, built almost solely on the model of a single haplotype sequence representing a species. This fundamental choice is apparently due to the huge conceptual simplification of carrying out analyses over sequences rather than over pairs of related sequences. In this paper, we show that raising the abstraction level not only creates conceptual difficulties but also changes the computational complexity for a natural non-trivial extension of optimal alignment to diploids. Of independent interest, our approach can also be seen as an extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a covering alignment of two labeled DAGs is NP-hard. A covering alignment asks for two paths P1(A) and P2(A) in DAG A and two paths P1(B) and P2(B) in DAG B that cover the nodes of the graphs and maximize the sum of the global alignment scores S(ℓ(P1(A)), ℓ(P1(B))) + S(ℓ(P2(A)), ℓ(P2(B))), where ℓ(P) is the concatenation of the labels on the path P. A pairwise alignment of the haplotype sequences forming a diploid chromosome can be converted to a two-path coverable labeled DAG, and the covering alignment then models the similarity of two diploids over arbitrary recombination.

arXiv:1611.05086v1 [cs.CC] 15 Nov 2016
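To make the covering alignment objective concrete, the following is a minimal brute-force sketch, not the paper's construction. It assumes that paths run from a source to a sink of each DAG and that S is a Needleman-Wunsch score with match +1, mismatch -1 and gap -1; neither assumption is stated in the abstract, and all function names are hypothetical. The search enumerates every covering pair of paths, so it is exponential in general, which is consistent with the NP-hardness result.

```python
from itertools import product


def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap cost.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def source_to_sink_paths(labels, edges):
    """Enumerate all source-to-sink paths of a DAG (assumed path model)."""
    succ = {v: [] for v in labels}
    has_pred = set()
    for u, v in edges:
        succ[u].append(v)
        has_pred.add(v)
    paths = []

    def extend(path):
        last = path[-1]
        if not succ[last]:          # reached a sink
            paths.append(tuple(path))
            return
        for w in succ[last]:
            extend(path + [w])

    for v in labels:
        if v not in has_pred:       # start from every source
            extend([v])
    return paths


def covering_pairs(labels, edges):
    """Ordered pairs of paths whose union contains every node of the DAG."""
    paths = source_to_sink_paths(labels, edges)
    nodes = set(labels)
    return [(p, q) for p in paths for q in paths if set(p) | set(q) == nodes]


def spell(labels, path):
    """Concatenate the node labels along a path."""
    return "".join(labels[v] for v in path)


def covering_alignment(labels_a, edges_a, labels_b, edges_b):
    """Best sum of two global alignment scores over all covering path pairs.

    Returns None if either DAG cannot be covered by two paths.
    """
    best = None
    pairs_a = covering_pairs(labels_a, edges_a)
    pairs_b = covering_pairs(labels_b, edges_b)
    for (p1a, p2a), (p1b, p2b) in product(pairs_a, pairs_b):
        score = (
            global_alignment_score(spell(labels_a, p1a), spell(labels_b, p1b))
            + global_alignment_score(spell(labels_a, p2a), spell(labels_b, p2b))
        )
        if best is None or score > best:
            best = score
    return best


# Two small "bubble" DAGs: node 0 branches to nodes 1 and 2, which rejoin at 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
dag_a = {0: "A", 1: "C", 2: "G", 3: "T"}   # spells ACT and AGT
dag_b = {0: "A", 1: "G", 2: "C", 3: "T"}   # spells AGT and ACT
print(covering_alignment(dag_a, edges, dag_b, edges))
```

Because the ordered pairs include both (P, Q) and (Q, P), the search also tries both ways of matching the paths of A with the paths of B.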
Similar resources
Alignment and Distribution Is Not (Always) NP-Hard
In this paper, an efficient algorithm to simultaneously implement array alignment and data/computation distribution is introduced and evaluated. We revisit previous work of Li and Chen [13, 14], and we show that their alignment step should not be conducted without preserving the potential parallelism. In other words, the optimal alignment may well sequentialize computations, whatever the distribu...
Settling the Intractability of Multiple Alignment
Multiple alignment is a core problem in computational biology that has received much attention over the years, both in the line of heuristics and hardness results. In most expositions of the problem it is referred to as NP-hard and references are given to one of the available hardness results. However, previous to this paper not even the most elementary variation of the problem, multiple alignm...
On the Complexity of Multiple Sequence Alignment
We study the computational complexity of two popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the first problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also considered.
Complexity of Multiple Sequence Alignment
It is shown that the multiple alignment problem with SP-score is NP-hard for each scoring matrix in a broad class M that includes most scoring matrices actually used in biological applications. The problem remains NP-hard even if sequences can only be shifted relative to each other and no internal gaps are allowed. It is also shown that there is a scoring matrix M0 such that the multiple alignment prob...
The Complexity of Phrase Alignment Problems
Many phrase alignment models operate over the combinatorial space of bijective phrase alignments. We prove that finding an optimal alignment in this space is NP-hard, while computing alignment expectations is #P-hard. On the other hand, we show that the problem of finding an optimal alignment can be cast as an integer linear program, which provides a simple, declarative approach to Viterbi infe...
Journal title:
- CoRR
Volume abs/1611.05086 Issue
Pages -
Publication date 2016